Overview

Dataset statistics

Number of variables            11
Number of observations         462
Missing cells                  0
Missing cells (%)              0.0%
Duplicate rows                 0
Duplicate rows (%)             0.0%
Total size in memory           39.8 KiB
Average record size in memory  88.3 B

Variable types

Numeric      9
Categorical  2

Alerts

adiposity is highly correlated with obesity and 1 other field    High correlation
obesity is highly correlated with adiposity                      High correlation
age is highly correlated with adiposity                          High correlation
names is uniformly distributed                                   Uniform
names has unique values                                          Unique
tobacco has 107 (23.2%) zeros                                    Zeros
alcohol has 110 (23.8%) zeros                                    Zeros

Reproduction

Analysis started        2022-11-01 20:23:38.782649
Analysis finished       2022-11-01 20:23:59.641655
Duration                20.86 seconds
Software version        pandas-profiling v3.4.0
Download configuration  config.json

Variables

names
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct      462
Distinct (%)  100.0%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          231.9350649
Minimum       1
Maximum       463
Zeros         0
Zeros (%)     0.0%
Negative      0
Negative (%)  0.0%
Memory size   3.7 KiB

Quantile statistics

Minimum                    1
5th percentile             24.05
Q1                         116.25
Median                     231.5
Q3                         347.75
95th percentile            439.95
Maximum                    463
Range                      462
Interquartile range (IQR)  231.5
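Assuming the report computes these values with pandas' `Series.quantile` at its default linear interpolation, the quantile block can be reproduced with the minimal pure-Python sketch below. The helper names `quantile` and `quantile_stats` are hypothetical, not part of pandas-profiling; the range is maximum minus minimum and the IQR is Q3 minus Q1.

```python
import math
from typing import Dict, Sequence


def quantile(sorted_values: Sequence[float], q: float) -> float:
    """Quantile by linear interpolation between the two nearest order statistics."""
    pos = q * (len(sorted_values) - 1)  # fractional 0-based position
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def quantile_stats(values: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper mirroring the 'Quantile statistics' block of the report."""
    xs = sorted(values)
    q1, q3 = quantile(xs, 0.25), quantile(xs, 0.75)
    return {
        "minimum": xs[0],
        "5th percentile": quantile(xs, 0.05),
        "Q1": q1,
        "median": quantile(xs, 0.5),
        "Q3": q3,
        "95th percentile": quantile(xs, 0.95),
        "maximum": xs[-1],
        "range": xs[-1] - xs[0],
        "IQR": q3 - q1,
    }


stats = quantile_stats(range(1, 11))
print(stats["Q1"], stats["median"], stats["Q3"], stats["IQR"])  # 3.25 5.5 7.75 4.5
```

With 10 values, Q1 sits at fractional position 0.25 × 9 = 2.25, a quarter of the way from the 3rd to the 4th smallest value.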

Descriptive statistics

Standard deviation               133.9385851
Coefficient of variation (CV)    0.5774831207
Kurtosis                         -1.203538107
Mean                             231.9350649
Median Absolute Deviation (MAD)  116
Skewness                         0.001436279421
Sum                              107154
Variance                         17939.54458
Monotonicity                     Strictly increasing
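The descriptive statistics can be sketched the same way. This assumes pandas conventions (sample variance and standard deviation with an n - 1 denominator, and the bias-corrected skewness and excess kurtosis returned by `Series.skew` and `Series.kurt`), and reads MAD as the median of absolute deviations from the median, as its label states. The coefficient of variation is the standard deviation divided by the mean; for `names`, 133.9385851 / 231.9350649 ≈ 0.5775, matching the value above. `describe` is a hypothetical helper name.

```python
import math
import statistics
from typing import Dict, Sequence


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper mirroring the 'Descriptive statistics' block (needs n > 3)."""
    xs = list(values)
    n = len(xs)
    mean = sum(xs) / n
    variance = statistics.variance(xs)  # sample variance, n - 1 denominator
    std = math.sqrt(variance)

    # Central moments with an n denominator, used by the bias corrections below.
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n

    # Adjusted Fisher-Pearson skewness.
    skewness = (m3 / m2 ** 1.5) * math.sqrt(n * (n - 1)) / (n - 2)

    # Bias-corrected excess kurtosis (0 for a normal distribution).
    g2 = m4 / m2 ** 2 - 3
    kurtosis = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))

    # Median absolute deviation from the median.
    median = statistics.median(xs)
    mad = statistics.median(abs(x - median) for x in xs)

    return {
        "standard deviation": std,
        "coefficient of variation": std / mean,
        "kurtosis": kurtosis,
        "mean": mean,
        "MAD": mad,
        "skewness": skewness,
        "sum": sum(xs),
        "variance": variance,
    }


d = describe([1, 2, 3, 4, 5])
print(d["variance"], d["MAD"], d["skewness"])  # 2.5 1 0.0
```

A perfectly symmetric sample has zero skewness, and evenly spread values give negative excess kurtosis, just as the near-uniform `names` column does.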
[Figure: Histogram with fixed-size bins (bins=50)]
Common values
Value               Count  Frequency (%)
1                   1      0.2%
319                 1      0.2%
317                 1      0.2%
316                 1      0.2%
315                 1      0.2%
314                 1      0.2%
313                 1      0.2%
312                 1      0.2%
311                 1      0.2%
310                 1      0.2%
Other values (452)  452    97.8%

Minimum 10 values
Value               Count  Frequency (%)
1                   1      0.2%
2                   1      0.2%
3                   1      0.2%
4                   1      0.2%
5                   1      0.2%
6                   1      0.2%
7                   1      0.2%
8                   1      0.2%
9                   1      0.2%
10                  1      0.2%

Maximum 10 values
Value               Count  Frequency (%)
463                 1      0.2%
462                 1      0.2%
461                 1      0.2%
460                 1      0.2%
459                 1      0.2%
458                 1      0.2%
457                 1      0.2%
456                 1      0.2%
455                 1      0.2%
454                 1      0.2%

sbp
Real number (ℝ≥0)

Distinct      62
Distinct (%)  13.4%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          138.3268398
Minimum       101
Maximum       218
Zeros         0
Zeros (%)     0.0%
Negative      0
Negative (%)  0.0%
Memory size   3.7 KiB

Quantile statistics

Minimum                    101
5th percentile             112
Q1                         124
Median                     134
Q3                         148
95th percentile            176
Maximum                    218
Range                      117
Interquartile range (IQR)  24

Descriptive statistics

Standard deviation               20.49631718
Coefficient of variation (CV)    0.1481731037
Kurtosis                         1.781646545
Mean                             138.3268398
Median Absolute Deviation (MAD)  12
Skewness                         1.180590625
Sum                              63907
Variance                         420.0990178
Monotonicity                     Not monotonic
[Figure: Histogram with fixed-size bins (bins=50)]
Common values
Value               Count  Frequency (%)
136                 29     6.3%
134                 29     6.3%
128                 25     5.4%
132                 24     5.2%
118                 21     4.5%
124                 21     4.5%
130                 20     4.3%
126                 20     4.3%
138                 18     3.9%
122                 17     3.7%
Other values (52)   238    51.5%

Minimum 10 values
Value               Count  Frequency (%)
101                 1      0.2%
102                 1      0.2%
103                 1      0.2%
106                 3      0.6%
108                 7      1.5%
109                 1      0.2%
110                 4      0.9%
112                 7      1.5%
114                 12     2.6%
116                 8      1.7%

Maximum 10 values
Value               Count  Frequency (%)
218                 1      0.2%
216                 1      0.2%
214                 1      0.2%
208                 3      0.6%
206                 2      0.4%
200                 1      0.2%
198                 1      0.2%
194                 2      0.4%
190                 2      0.4%
188                 1      0.2%

tobacco
Real number (ℝ≥0)

ZEROS

Distinct      214
Distinct (%)  46.3%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          3.635649351
Minimum       0
Maximum       31.2
Zeros         107
Zeros (%)     23.2%
Negative      0
Negative (%)  0.0%
Memory size   3.7 KiB

Quantile statistics

Minimum                    0
5th percentile             0
Q1                         0.0525
Median                     2
Q3                         5.5
95th percentile            12.49
Maximum                    31.2
Range                      31.2
Interquartile range (IQR)  5.4475

Descriptive statistics

Standard deviation               4.593024078
Coefficient of variation (CV)    1.263329776
Kurtosis                         5.968107866
Mean                             3.635649351
Median Absolute Deviation (MAD)  2
Skewness                         2.079209667
Sum                              1679.67
Variance                         21.09587018
Monotonicity                     Not monotonic
[Figure: Histogram with fixed-size bins (bins=50)]
Common values
Value               Count  Frequency (%)
0                   107    23.2%
6                   11     2.4%
3                   10     2.2%
0.4                 8      1.7%
4                   8      1.7%
4.5                 7      1.5%
4.2                 7      1.5%
12                  5      1.1%
0.6                 5      1.1%
2                   5      1.1%
Other values (204)  289    62.6%

Minimum 10 values
Value               Count  Frequency (%)
0                   107    23.2%
0.01                1      0.2%
0.02                1      0.2%
0.03                1      0.2%
0.04                2      0.4%
0.05                4      0.9%
0.06                1      0.2%
0.07                1      0.2%
0.08                2      0.4%
0.09                1      0.2%

Maximum 10 values
Value               Count  Frequency (%)
31.2                1      0.2%
27.4                1      0.2%
25.01               1      0.2%
20                  2      0.4%
19.6                1      0.2%
19.45               1      0.2%
19.2                1      0.2%
18.2                1      0.2%
18                  1      0.2%
16                  1      0.2%

ldl
Real number (ℝ≥0)

Distinct      329
Distinct (%)  71.2%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          4.740324675
Minimum       0.98
Maximum       15.33
Zeros         0
Zeros (%)     0.0%
Negative      0
Negative (%)  0.0%
Memory size   3.7 KiB

Quantile statistics

Minimum                    0.98
5th percentile             2.1945
Q1                         3.2825
Median                     4.34
Q3                         5.79
95th percentile            8.404
Maximum                    15.33
Range                      14.35
Interquartile range (IQR)  2.5075

Descriptive statistics

Standard deviation               2.070909161
Coefficient of variation (CV)    0.4368707426
Kurtosis                         2.876552943
Mean                             4.740324675
Median Absolute Deviation (MAD)  1.195
Skewness                         1.31310398
Sum                              2190.03
Variance                         4.288664753
Monotonicity                     Not monotonic
[Figure: Histogram with fixed-size bins (bins=50)]
Common values
Value               Count  Frequency (%)
4.37                5      1.1%
3.95                5      1.1%
3.57                5      1.1%
3.58                4      0.9%
2.4                 4      0.9%
4.16                4      0.9%
3.3                 4      0.9%
5.63                3      0.6%
3.2                 3      0.6%
3.17                3      0.6%
Other values (319)  422    91.3%

Minimum 10 values
Value               Count  Frequency (%)
0.98                1      0.2%
1.07                1      0.2%
1.43                1      0.2%
1.55                1      0.2%
1.59                1      0.2%
1.71                1      0.2%
1.72                1      0.2%
1.74                1      0.2%
1.77                1      0.2%
1.8                 1      0.2%

Maximum 10 values
Value               Count  Frequency (%)
15.33               1      0.2%
14.16               1      0.2%
12.42               1      0.2%
11.89               1      0.2%
11.61               1      0.2%
11.41               1      0.2%
11.32               1      0.2%
11.17               1      0.2%
10.58               1      0.2%
10.53               1      0.2%

adiposity
Real number (ℝ≥0)

HIGH CORRELATION

Distinct      408
Distinct (%)  88.3%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          25.4067316
Minimum       6.74
Maximum       42.49
Zeros         0
Zeros (%)     0.0%
Negative      0
Negative (%)  0.0%
Memory size   3.7 KiB

Quantile statistics

Minimum                    6.74
5th percentile             12.0065
Q1                         19.775
Median                     26.115
Q3                         31.2275
95th percentile            37.1165
Maximum                    42.49
Range                      35.75
Interquartile range (IQR)  11.4525

Descriptive statistics

Standard deviation               7.780698596
Coefficient of variation (CV)    0.306245554
Kurtosis                         -0.6984386244
Mean                             25.4067316
Median Absolute Deviation (MAD)  5.7
Skewness                         -0.2146459286
Sum                              11737.91
Variance                         60.53927064
Monotonicity                     Not monotonic
[Figure: Histogram with fixed-size bins (bins=50)]
Common values
Value               Count  Frequency (%)
30.79               3      0.6%
27.55               3      0.6%
21.1                3      0.6%
29.3                3      0.6%
24.65               2      0.4%
29.18               2      0.4%
30.9                2      0.4%
30.11               2      0.4%
26.08               2      0.4%
32.03               2      0.4%
Other values (398)  438    94.8%

Minimum 10 values
Value               Count  Frequency (%)
6.74                1      0.2%
7.12                1      0.2%
8.66                1      0.2%
9.28                1      0.2%
9.37                1      0.2%
9.39                1      0.2%
9.64                1      0.2%
9.69                2      0.4%
9.74                1      0.2%
10.05               1      0.2%

Maximum 10 values
Value               Count  Frequency (%)
42.49               1      0.2%
42.17               1      0.2%
42.06               1      0.2%
41.05               1      0.2%
40.6                1      0.2%
39.97               1      0.2%
39.71               1      0.2%
39.68               1      0.2%
39.66               1      0.2%
39.64               1      0.2%

famhist
Categorical

Distinct      2
Distinct (%)  0.4%
Missing       0
Missing (%)   0.0%
Memory size   3.7 KiB
Absent   270
Present  192

Length

Max length     7
Median length  6
Mean length    6.415584416
Min length     6

Characters and Unicode

Total characters     2964
Distinct characters  8
Distinct categories  2
Distinct scripts     1
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  Present
2nd row  Absent
3rd row  Present
4th row  Present
5th row  Present

Common Values

Value    Count  Frequency (%)
Absent   270    58.4%
Present  192    41.6%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value    Count  Frequency (%)
absent   270    58.4%
present  192    41.6%

Most occurring characters

Value  Count  Frequency (%)
e      654    22.1%
s      462    15.6%
n      462    15.6%
t      462    15.6%
A      270    9.1%
b      270    9.1%
P      192    6.5%
r      192    6.5%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  2502   84.4%
Uppercase Letter  462    15.6%
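The character and category tables for `famhist` can be reproduced with Python's standard `unicodedata` module, whose `category` function returns the Unicode general category code (`Ll` for lowercase letter, `Lu` for uppercase letter, `Nd` for decimal number). This sketch covers characters and categories only; the standard library has no lookup for Unicode scripts or blocks, so those tables are not reproduced. The readable-name dictionary and the `character_stats` helper are hypothetical, written for illustration.

```python
import unicodedata
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical display names for the general-category codes that occur in this report.
CATEGORY_NAMES = {"Lu": "Uppercase Letter", "Ll": "Lowercase Letter", "Nd": "Decimal Number"}


def character_stats(values: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count every character, then aggregate the counts by Unicode general category."""
    chars: Counter = Counter()
    for value in values:
        chars.update(value)  # a string updates the counter one character at a time
    categories: Counter = Counter()
    for ch, count in chars.items():
        code = unicodedata.category(ch)
        categories[CATEGORY_NAMES.get(code, code)] += count
    return chars, categories


# Rebuild famhist from the value counts reported above (270 Absent, 192 Present).
famhist = ["Absent"] * 270 + ["Present"] * 192
chars, categories = character_stats(famhist)
print(sum(chars.values()), chars["e"], categories["Lowercase Letter"])  # 2964 654 2502
```

The letter `e` occurs once in "Absent" and twice in "Present", giving 270 + 2 × 192 = 654.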

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
e      654    26.1%
s      462    18.5%
n      462    18.5%
t      462    18.5%
b      270    10.8%
r      192    7.7%

Uppercase Letter
Value  Count  Frequency (%)
A      270    58.4%
P      192    41.6%

Most occurring scripts

Value  Count  Frequency (%)
Latin  2964   100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
e      654    22.1%
s      462    15.6%
n      462    15.6%
t      462    15.6%
A      270    9.1%
b      270    9.1%
P      192    6.5%
r      192    6.5%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  2964   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
e      654    22.1%
s      462    15.6%
n      462    15.6%
t      462    15.6%
A      270    9.1%
b      270    9.1%
P      192    6.5%
r      192    6.5%

typea
Real number (ℝ≥0)

Distinct      54
Distinct (%)  11.7%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          53.1038961
Minimum       13
Maximum       78
Zeros         0
Zeros (%)     0.0%
Negative      0
Negative (%)  0.0%
Memory size   3.7 KiB

Quantile statistics

Minimum                    13
5th percentile             36
Q1                         47
Median                     53
Q3                         60
95th percentile            69
Maximum                    78
Range                      65
Interquartile range (IQR)  13

Descriptive statistics

Standard deviation               9.817534116
Coefficient of variation (CV)    0.1848740834
Kurtosis                         0.4704023399
Mean                             53.1038961
Median Absolute Deviation (MAD)  6
Skewness                         -0.3464377547
Sum                              24534
Variance                         96.38397611
Monotonicity                     Not monotonic
[Figure: Histogram with fixed-size bins (bins=50)]
Common values
Value               Count  Frequency (%)
52                  25     5.4%
57                  23     5.0%
54                  21     4.5%
50                  21     4.5%
49                  20     4.3%
60                  18     3.9%
56                  18     3.9%
55                  17     3.7%
61                  17     3.7%
47                  17     3.7%
Other values (44)   265    57.4%

Minimum 10 values
Value               Count  Frequency (%)
13                  1      0.2%
20                  1      0.2%
25                  1      0.2%
26                  1      0.2%
28                  1      0.2%
29                  1      0.2%
30                  2      0.4%
31                  2      0.4%
32                  1      0.2%
33                  4      0.9%

Maximum 10 values
Value               Count  Frequency (%)
78                  1      0.2%
77                  1      0.2%
75                  1      0.2%
74                  2      0.4%
73                  2      0.4%
72                  4      0.9%
71                  2      0.4%
70                  5      1.1%
69                  7      1.5%
68                  6      1.3%

obesity
Real number (ℝ≥0)

HIGH CORRELATION

Distinct      400
Distinct (%)  86.6%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          26.04411255
Minimum       14.7
Maximum       46.58
Zeros         0
Zeros (%)     0.0%
Negative      0
Negative (%)  0.0%
Memory size   3.7 KiB

Quantile statistics

Minimum                    14.7
5th percentile             20.17
Q1                         22.985
Median                     25.805
Q3                         28.4975
95th percentile            33.138
Maximum                    46.58
Range                      31.88
Interquartile range (IQR)  5.5125

Descriptive statistics

Standard deviation               4.213680227
Coefficient of variation (CV)    0.161790125
Kurtosis                         2.255971618
Mean                             26.04411255
Median Absolute Deviation (MAD)  2.71
Skewness                         0.9052194041
Sum                              12032.38
Variance                         17.75510105
Monotonicity                     Not monotonic
[Figure: Histogram with fixed-size bins (bins=50)]
Common values
Value               Count  Frequency (%)
24.86               4      0.9%
26.09               4      0.9%
22.51               3      0.6%
21.94               3      0.6%
24.98               3      0.6%
22.01               3      0.6%
27.29               3      0.6%
28.4                3      0.6%
25.99               3      0.6%
22.59               3      0.6%
Other values (390)  430    93.1%

Minimum 10 values
Value               Count  Frequency (%)
14.7                1      0.2%
17.75               1      0.2%
17.81               1      0.2%
17.89               1      0.2%
18.36               1      0.2%
18.46               1      0.2%
18.5                1      0.2%
18.75               1      0.2%
19.15               1      0.2%
19.3                1      0.2%

Maximum 10 values
Value               Count  Frequency (%)
46.58               1      0.2%
45.72               1      0.2%
41.76               1      0.2%
40.34               1      0.2%
38.8                1      0.2%
37.71               1      0.2%
37.41               1      0.2%
37.24               1      0.2%
36.46               1      0.2%
36.06               1      0.2%

alcohol
Real number (ℝ≥0)

ZEROS

Distinct      249
Distinct (%)  53.9%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          17.04439394
Minimum       0
Maximum       147.19
Zeros         110
Zeros (%)     23.8%
Negative      0
Negative (%)  0.0%
Memory size   3.7 KiB

Quantile statistics

Minimum                    0
5th percentile             0
Q1                         0.51
Median                     7.51
Q3                         23.8925
95th percentile            66.8495
Maximum                    147.19
Range                      147.19
Interquartile range (IQR)  23.3825

Descriptive statistics

Standard deviation               24.48105869
Coefficient of variation (CV)    1.43631148
Kurtosis                         6.421109969
Mean                             17.04439394
Median Absolute Deviation (MAD)  7.51
Skewness                         2.312698937
Sum                              7874.51
Variance                         599.3222347
Monotonicity                     Not monotonic
[Figure: Histogram with fixed-size bins (bins=50)]
Common values
Value               Count  Frequency (%)
0                   110    23.8%
2.06                16     3.5%
0.51                8      1.7%
14.4                5      1.1%
43.2                5      1.1%
11.11               5      1.1%
8.23                5      1.1%
8.33                5      1.1%
4.11                4      0.9%
3.81                4      0.9%
Other values (239)  295    63.9%

Minimum 10 values
Value               Count  Frequency (%)
0                   110    23.8%
0.19                1      0.2%
0.26                1      0.2%
0.37                2      0.4%
0.51                8      1.7%
0.6                 1      0.2%
0.68                2      0.4%
0.69                1      0.2%
0.74                2      0.4%
0.86                1      0.2%

Maximum 10 values
Value               Count  Frequency (%)
147.19              1      0.2%
145.29              1      0.2%
144                 1      0.2%
120.03              1      0.2%
109.8               1      0.2%
108                 1      0.2%
100.32              1      0.2%
97.2                1      0.2%
92.62               1      0.2%
90.93               1      0.2%

age
Real number (ℝ≥0)

HIGH CORRELATION

Distinct      49
Distinct (%)  10.6%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          42.81601732
Minimum       15
Maximum       64
Zeros         0
Zeros (%)     0.0%
Negative      0
Negative (%)  0.0%
Memory size   3.7 KiB

Quantile statistics

Minimum                    15
5th percentile             17
Q1                         31
Median                     45
Q3                         55
95th percentile            62
Maximum                    64
Range                      49
Interquartile range (IQR)  24

Descriptive statistics

Standard deviation               14.60895644
Coefficient of variation (CV)    0.3412030675
Kurtosis                         -1.01622901
Mean                             42.81601732
Median Absolute Deviation (MAD)  12
Skewness                         -0.3817342585
Sum                              19781
Variance                         213.4216084
Monotonicity                     Not monotonic
[Figure: Histogram with fixed-size bins (bins=49)]
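The histograms use equal-width bins spanning each column's range. Note that `age`, which has 49 distinct values, is plotted with bins=49, while every other numeric column (each with more than 50 distinct values) uses bins=50. This is consistent with the bin count being capped at the number of distinct values, but that rule is an assumption inferred from this report, not confirmed pandas-profiling behaviour. A minimal sketch of fixed-width binning under that assumption (`fixed_width_histogram` is a hypothetical name); like `numpy.histogram`, every bin is half-open except the last, which also includes the maximum.

```python
from typing import List, Sequence, Tuple


def fixed_width_histogram(
    values: Sequence[float], max_bins: int = 50
) -> Tuple[List[float], List[int]]:
    """Equal-width histogram; the bin count is capped at the number of distinct values (assumed rule)."""
    lo, hi = min(values), max(values)
    n_bins = min(max_bins, len(set(values)))
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for v in values:
        # Bins are [a, b) except the last one, which is closed so the maximum is counted.
        idx = int((v - lo) / width) if width > 0 else 0
        counts[min(idx, n_bins - 1)] += 1
    return edges, counts


edges, counts = fixed_width_histogram([1, 1, 2, 3])
print(counts)  # [2, 1, 1]
```

With only three distinct values, the example uses three bins rather than 50, mirroring how `age` ends up with 49.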
Common values
Value               Count  Frequency (%)
16                  20     4.3%
58                  17     3.7%
17                  17     3.7%
61                  16     3.5%
59                  16     3.5%
55                  16     3.5%
60                  15     3.2%
45                  14     3.0%
53                  14     3.0%
49                  14     3.0%
Other values (39)   303    65.6%

Minimum 10 values
Value               Count  Frequency (%)
15                  3      0.6%
16                  20     4.3%
17                  17     3.7%
18                  8      1.7%
19                  2      0.4%
20                  6      1.3%
21                  3      0.6%
23                  2      0.4%
24                  6      1.3%
25                  4      0.9%

Maximum 10 values
Value               Count  Frequency (%)
64                  13     2.8%
63                  8      1.7%
62                  12     2.6%
61                  16     3.5%
60                  15     3.2%
59                  16     3.5%
58                  17     3.7%
57                  8      1.7%
56                  9      1.9%
55                  16     3.5%

chd
Categorical

Distinct      2
Distinct (%)  0.4%
Missing       0
Missing (%)   0.0%
Memory size   3.7 KiB
0  302
1  160

Length

Max length     1
Median length  1
Mean length    1
Min length     1

Characters and Unicode

Total characters     462
Distinct characters  2
Distinct categories  1
Distinct scripts     1
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  1
2nd row  1
3rd row  0
4th row  1
5th row  1

Common Values

Value  Count  Frequency (%)
0      302    65.4%
1      160    34.6%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value  Count  Frequency (%)
0      302    65.4%
1      160    34.6%

Most occurring characters

Value  Count  Frequency (%)
0      302    65.4%
1      160    34.6%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  462    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      302    65.4%
1      160    34.6%

Most occurring scripts

Value   Count  Frequency (%)
Common  462    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      302    65.4%
1      160    34.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  462    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      302    65.4%
1      160    34.6%

Interactions

[Figure: pairwise interaction plots for the 9 numeric variables (81 panels)]

Correlations


Auto

The auto setting is an easily interpretable pairwise column metric with the following mapping (variable type pair: method): categorical-categorical: Cramér's V; numerical-categorical: Cramér's V (using a discretized numerical column); numerical-numerical: Spearman's ρ. This configuration uses the most suitable method for each pair of columns.
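The mapping above amounts to a lookup keyed on the unordered pair of column types. A minimal sketch, assuming just two type labels, `"numerical"` and `"categorical"`; `AUTO_METHODS` and `auto_method` are hypothetical names, not pandas-profiling's own API.

```python
from typing import Dict, FrozenSet

# Method for each unordered pair of column types, following the mapping described above.
# The type labels and this lookup table are illustrative, not the pandas-profiling API.
AUTO_METHODS: Dict[FrozenSet[str], str] = {
    frozenset({"categorical"}): "Cramér's V",
    frozenset({"numerical", "categorical"}): "Cramér's V (numerical column discretized)",
    frozenset({"numerical"}): "Spearman's rho",
}


def auto_method(type_a: str, type_b: str) -> str:
    """Return the correlation method for a pair of column types; argument order is irrelevant."""
    return AUTO_METHODS[frozenset({type_a, type_b})]


print(auto_method("numerical", "categorical"))
```

Keying on a `frozenset` makes the lookup symmetric, and a same-type pair such as numerical-numerical collapses to a one-element set.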

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
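That definition translates directly into code: rank each variable (tied values get the average of the ranks they span), then compute Pearson's r on the ranks. This is a minimal pure-Python sketch, not the implementation used by pandas-profiling; the function names are hypothetical. The example pairs x with x³, a monotonic but nonlinear relationship, where ρ is exactly 1 while Pearson's r falls below 1.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Covariance over the product of standard deviations (the n - 1 factors cancel)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear
print(spearman(x, y), round(pearson(x, y), 3))
```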

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear relationship the slope of the line, apart from its sign, does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
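A minimal sketch of that formula in plain Python (the name `pearson_r` is hypothetical). Because the 1/(n - 1) factors in the covariance and in both standard deviations cancel, it works with plain sums of deviations. The example also checks the invariance property stated above by shifting and rescaling both variables with positive factors.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(x, y)
# Separate changes of location and (positive) scale leave r unchanged.
r_shifted = pearson_r([10 * a + 7 for a in x], [0.5 * b - 3 for b in y])
print(round(r, 6), round(r_shifted, 6))  # 0.8 0.8
```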

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
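Counting pairs as described gives the variant usually called tau-a, sketched below (the name `kendall_tau_a` is hypothetical). Pairs tied in either variable count as neither concordant nor discordant here. Tie-adjusted variants such as tau-b, which `scipy.stats.kendalltau` computes by default, use a different denominator, so results differ when ties are present.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1  # both variables move in the same direction
        elif s < 0:
            discordant += 1  # the variables move in opposite directions
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

In the example, two of the three pairs are concordant and one is discordant, so τ = (2 - 1) / 3.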

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V are known to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
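The bias correction can be sketched as follows, assuming Bergsma's (2013) formulas: with φ² = χ²/n for an r × k contingency table, φ̃² = max(0, φ² - (k - 1)(r - 1)/(n - 1)), r̃ = r - (r - 1)²/(n - 1), k̃ = k - (k - 1)²/(n - 1), and Ṽ = sqrt(φ̃² / min(k̃ - 1, r̃ - 1)). This sketch uses the plain Pearson χ² statistic without a continuity correction; whether the report applies one is not stated here. The function name is hypothetical, and the table must not contain all-zero rows or columns.

```python
import math
from typing import Sequence


def cramers_v_corrected(table: Sequence[Sequence[float]]) -> float:
    """Bias-corrected Cramér's V from an r x k contingency table (no empty rows/columns)."""
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic, without continuity correction.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Perfect association, then independence.
print(cramers_v_corrected([[50, 0], [0, 50]]), cramers_v_corrected([[25, 25], [25, 25]]))
```

The `max(0, ...)` clamp is what keeps the corrected estimate at exactly 0 for an independent table, where the uncorrected bias term would otherwise push φ̃² negative.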

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display that lets you quickly spot patterns in data completion.]

Sample

First rows

     names  sbp  tobacco  ldl    adiposity  famhist  typea  obesity  alcohol  age  chd
0    1      160  12.00    5.73   23.11      Present  49     25.30    97.20    52   1
1    2      144  0.01     4.41   28.61      Absent   55     28.87    2.06     63   1
2    3      118  0.08     3.48   32.28      Present  52     29.14    3.81     46   0
3    4      170  7.50     6.41   38.03      Present  51     31.99    24.26    58   1
4    5      134  13.60    3.50   27.78      Present  60     25.99    57.34    49   1
5    6      132  6.20     6.47   36.21      Present  62     30.77    14.14    45   0
6    7      142  4.05     3.38   16.20      Absent   59     20.81    2.62     38   0
7    8      114  4.08     4.59   14.60      Present  62     23.11    6.72     58   1
8    9      114  0.00     3.83   19.40      Present  49     24.86    2.49     29   0
9    10     132  0.00     5.80   30.96      Present  69     30.11    0.00     53   1

Last rows

     names  sbp  tobacco  ldl    adiposity  famhist  typea  obesity  alcohol  age  chd
452  454    154  5.53     3.20   28.81      Present  61     26.15    42.79    42   0
453  455    124  1.60     7.22   39.68      Present  36     31.50    0.00     51   1
454  456    146  0.64     4.82   28.02      Absent   60     28.11    8.23     39   1
455  457    128  2.24     2.83   26.48      Absent   48     23.96    47.42    27   1
456  458    170  0.40     4.11   42.06      Present  56     33.10    2.06     57   0
457  459    214  0.40     5.98   31.72      Absent   64     28.45    0.00     58   0
458  460    182  4.20     4.41   32.10      Absent   52     28.61    18.72    52   1
459  461    108  3.00     1.59   15.23      Absent   40     20.09    26.64    55   0
460  462    118  5.40     11.61  30.79      Absent   64     27.35    23.97    40   0
461  463    132  0.00     4.82   33.41      Present  62     14.70    0.00     46   1